Organizational Climate:

Another  Look  at  a  Potentially  Important  Construct 

In a recent discussion of measurement models in climate research, James
(1982) recommended that a decision of whether to aggregate individuals'
climate scores should be a function of the magnitude of an intraclass
correlation estimate of interrater reliability. This recommendation was based
on the following rationale: (a) the basic unit of theory (unit of analysis) for
climate is individuals' perceptions of their psychological climate (James &
Sells, 1981; Jones & James, 1979; Joyce & Slocum, 1979; Schneider, in press);
(b) a composition theory relating psychological climate scores to aggregate
psychological climate scores (e.g., organizational climate scores) may be
established if the perceptions of psychological climate are shared among the
individuals on whom the aggregate is computed (Roberts, Hulin, & Rousseau, 1978);
and (c) the typical design employed in climate studies is a random effects,
one-way analysis of variance (ANOVA), from which, given reasonable satisfaction
of assumptions, it is possible to estimate interrater reliability (perceptual
agreement, degree perceptions are shared) by the intraclass correlation equation
for the reliability of a single rating or measurement [referred to here as
ICC(1); cf. Bartko, 1976; Ebel, 1951; Shrout & Fleiss, 1979; Winer, 1971].

The objectives of the present paper represent, in part, a continuation of
the discussion above. It is suggested that the criterion for perceptual
agreement and aggregation of psychological climate scores is a reasonably high
ICC(1). Based on this criterion, it is shown that legitimate indices of
interrater reliability render organizational climate a moot issue, where the term
organizational climate is used to refer to a field of research which involves
any type of aggregate psychological climate scores (Jones & James, 1979;
Schneider, in press). It is then demonstrated that estimates of interrater
reliability based on an ICC(1) approach may, under specified conditions,
furnish serious underestimates of interrater reliability. Finally, a new method
for estimating interrater reliability is overviewed, and an empirical
illustration is used to show that it is possible to achieve high levels of
interrater reliability on climate data. The conclusion reached is that
organizational climate is a salvageable construct.

A  Criterion  for  Perceptual  Agreement  in  Climate  Research 

Climate has been reviewed extensively in recent years, with the output focused
mainly on restatements of prior positions, reviews of these positions, and
reviews of reviews (Campbell, Dunnette, Lawler, & Weick, 1970; Hellriegel &
Slocum, 1974; Insel & Moos, 1974; James, Hater, Gent, & Bruni, 1978; James &
Jones, 1974; James & Sells, 1981; Jones & James, 1979; Joyce & Slocum, 1979;
Naylor, Pritchard, & Ilgen, 1980; Payne, Fineman, & Wall, 1976; Payne & Pugh,
1976; Powell & Butterfield, 1978; Schneider, 1975, in press; Schneider,
Parkington, & Buxton, 1980; Woodman & King, 1978). Represented ubiquitously in
these reviews is the logic that perceptual agreement should precede aggregation
of climate scores. Yet, a criterion for an acceptable level of perceptual
agreement, that is, a level that justifies aggregation, remains obscure
(James, 1982). Exceptions to this rule include Guion (1973), who recommended
that agreement indices should not depart significantly from 1.00, and Roberts
et al. (1978) and Schneider (in press), who recommended that within-organization
variance in climate perceptions should be small in relation to
between-organization variance. The fact that these two recommendations are
statistically miles apart is easily demonstrated if we apply their statistical
implications to the typical experimental design used in climate research.

Suppose that we have $n_k$ individuals nested in each of K (k = 1, ..., K)
organizations. For the present, it is presumed that the assumptions for a
one-way random effects ANOVA and computation of an ICC(1) have been reasonably
satisfied (e.g., randomly selected organizations and individuals, homogeneity of
variance). The ANOVA employs the K organizations as treatments and the $n_k$
scores on a climate variable in each organization as values on the dependent
variable. The "empirical criterion" for agreement for Roberts et al. (1978)
and Schneider (in press) appears to be a significant F ratio, which connotes
significantly greater between-organization variance than within-organization
variance. Note that no point estimate of interrater reliability is required.
This suggests, for example, that with large samples an ICC(1) of .05 is
acceptable as long as the F ratio is significant [cf. Jones & James, 1979, for
point estimates of ICC(1) with large samples]. The Guion (1973) criterion is
much more stringent. It implies, for example, that in each of the K
organizations the variance on the climate variable should not depart
significantly from zero (see Li, 1964, for a chi-square test for variances).
This suggests that not only should the F ratio be significant, but also that
the ICC(1) should approach 1.00.

The position advocated in this paper is that a criterion for perceptual
agreement requires first a point estimate of interrater reliability. To
demonstrate merely that an F ratio is significant is of trivial concern in
relation to the magnitude of the interrater reliability estimate, especially
when N ($N = \sum n_k$) is large (Cohen, 1960). Thus, the Roberts et al. (1978)
and Schneider (in press) criterion is not regarded as sufficient for justifying
aggregation of climate scores. The implied necessity for a point estimate of
interrater reliability approaching 1.00 (Guion, 1973) is regarded as too
stringent. Consider, for example, that the conventional criterion for computing
an aggregate over items, that is, a composite score for each individual on
items designed to assess the same construct, is an internal consistency
reliability (e.g., coefficient alpha) of .70 and above (in exploratory
studies). If we were to require alphas approaching 1.00, few item composites
would be computed. I submit that the same conventional criterion can be used
as a criterion for interrater reliability and aggregation of climate scores
over individuals. Specifically, given the design in question, it is recommended
that an ICC(1) of .70 should be employed as a lower-bound criterion for
justifying aggregation of climate scores over individuals.

Is  Organizational  Climate  a  Moot  Issue? 

James (1982) summarized estimates of perceptual agreement in climate studies
and reported that the range of estimates varied from .00 to .50, with a median
of approximately .12. The estimates included in the summary were based on
either ICC(1) or estimates of the proportion of variance in individuals'
perceptions associated with variation among environments (eta-squares,
omega-squares). For reasons explained in that article, estimates based on
aggregates were considered biased and excluded from the summary. Also excluded
were estimates of interrater reliability based on correlations among profiles
(e.g., a correlation between two raters' scores on a set of climate dimensions)
for reasons discussed by numerous authors (cf. Cronbach & Gleser, 1953), and a
study by Howe (1977), which confounded stability of perceptions over time with
agreement among perceptions at a particular point in time.

Given that legitimate estimates of interrater reliability do not exceed .50,
it follows that if we were to adopt a point estimate of interrater reliability
equal to or greater than .70 as the operational criterion for perceptual
agreement and aggregation, then organizational climate as presently conceived
is a moot issue. Or is it?

Appropriateness  of  the  Intraclass  Correlation  in  Climate  Studies 

The objective of this section is to suggest that the intraclass correlation,
and other statistics that employ a between-group versus within-group form of
design (eta-square, omega-square), may have provided substantial underestimates
of interrater reliability in at least some prior climate studies. Discussion
focuses on ICC(1), and is based on a recent statistical paper by James,
Wolf, and Demaree (Note 1).

Associated with the ICC(1) statistic are a number of assumptions underlying
the ANOVA procedure on which it is based. One assumption is that the
environments employed in a study comprise a random sample of environments from
a heterogeneous population of environments. The somewhat subtle implication of
this assumption is that if a mean (aggregate) climate score is computed over
the $n_k$ individuals in each environment, then these means will vary among
environments, especially in a condition of high interrater reliability. To be
specific, between-environment variance in mean climate perceptions is a
prerequisite for high interrater reliability. Now, consider the statistical
facts that if (a) little variation exists among the K mean climate perceptions
for K environments, and if (b) perceivers in each of the K environments agree
almost perfectly (i.e., within-environment variance is close to zero), then
(c) the ICC(1) estimate of interrater reliability will be low. Note that we
have a condition of almost perfect agreement within environments and an
estimate of interrater reliability that is conceivably zero (or even negative
in value). In effect, this is the restriction of range problem extended to
estimates of interrater reliability, where by restriction of range is meant
little or no variation among the mean climate perceptions over environments.
(The same logic applies to eta-square and omega-square, although these
statistics may themselves differ; cf. Maxwell, Camp, & Arvey, 1981.)

These points are easily illustrated statistically. The data presented in
Table 1 consist of hypothetical scores on a climate item X which has five
discrete, approximately equally spaced alternatives (e.g., a Likert scale; cf.
Cooper, 1976; Hsu, 1979). Frequencies of responses are shown for 20 different
individuals in each of two environments. The frequencies of responses indicate
that the individuals in each environment tend to agree, which is reflected by
the small within-environment variances (.211 and .261). However, ICC(1) is
-.047, which is regarded as .00 (Bartko, 1976). This low ICC is clearly an
underestimate of true agreement, and may be attributed directly to the essential
absence of variation among the aggregate climate scores (3.00 and 3.05).
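
The restriction of range effect can be checked with a short calculation. The sketch below is an illustration only: Table 1 is not reproduced in this section, so the response frequencies are assumed values chosen to match the reported means (3.00 and 3.05) and within-environment variances (.211 and .261), and the function names are hypothetical. ICC(1) is computed from the one-way ANOVA mean squares for equal group sizes as (MSB - MSW) / [MSB + (n - 1)MSW].

```python
from statistics import mean, variance

def expand(freqs):
    """Turn {score: count} frequencies into a flat list of ratings."""
    return [score for score, count in freqs.items() for _ in range(count)]

def icc1(groups):
    """ICC(1) from a one-way random effects ANOVA with equal group sizes."""
    k = len(groups)
    n = len(groups[0])
    assert all(len(g) == n for g in groups), "sketch assumes equal group sizes"
    grand = mean(x for g in groups for x in g)
    # Between-group mean square: variation of group means around the grand mean.
    msb = n * sum((mean(g) - grand) ** 2 for g in groups) / (k - 1)
    # Within-group mean square: with equal n, the average of the group variances.
    msw = sum(variance(g) for g in groups) / k
    return (msb - msw) / (msb + (n - 1) * msw)

# Assumed frequencies on the five-point item X (not the actual Table 1 entries).
env1 = expand({2: 2, 3: 16, 4: 2})
env2 = expand({2: 2, 3: 15, 4: 3})

print(round(mean(env1), 2), round(variance(env1), 3))
print(round(mean(env2), 2), round(variance(env2), 3))
print(round(icc1([env1, env2]), 3))
```

With these assumed frequencies the within-environment variances round to .211 and .261 and the ICC(1) rounds to -.047, in line with the values reported above. The between-environment mean square is tiny because the two means are nearly equal, which is what drives the estimate below zero despite the close agreement within each environment.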


Insert Table 1 about here


Data such as presented in Table 1 stimulate the following question: Why
should the level of agreement within an environment be contingent on differences
among environments? That is, in its most direct form, interrater reliability
and agreement address the question of whether people in a particular environment,
or people in each one of a set of environments, agree with respect to their
perceptions. This question neither assumes nor requires that differences exist
among environments. Of course, if environments were sampled randomly from a
heterogeneous population of environments in which mean climate perceptions were
expected to vary, then we would not anticipate a restriction of range problem
such as illustrated in Table 1. This point is discussed below. It is also
noteworthy that if the level of agreement varies as a function of environment
(i.e., the level of agreement is not the same or similar across environments),
then the ANOVA-based ICC(1) approach cannot be used because the homogeneity of
variance assumption is violated. Thus, even if one wanted to include
between-environment differences in an interrater reliability estimate, one
could not do so legitimately.

In sum, use of a statistic such as the ICC(1) that relies on
between-environment differences will result in an underestimate of interrater
reliability (agreement) if the following conditions exist: (a) mean climate
scores do not vary meaningfully among environments, and (b) individuals within
environments tend to agree.

A case can be built that these two conditions apply to at least some climate
studies. The case for low variation among mean climate scores over a set of
K environments is predicated on the fact that many climate studies have employed
samples of environments from the same basic system or, more typically, subsystem
type. It is not uncommon to find the sample in a particular study limited to
banks, to classrooms, to dormitories, to hospitals, to life insurance agencies,
or to divisions aboard Navy ships. Now consider the possibility that variation
among mean climate scores is likely to be restricted if all environments in a
sample are of the same or similar basic type, regardless of whether the
environments were randomly sampled from within this type. That is, sampling of
environments from a homogeneous type of environment, in relation to a more
heterogeneous population of environmental types, is likely to lead to
restricted variances on situational attributes believed to be causes of climate
perceptions, such as technology, structure, systems norms and values, and
processes (e.g., communication, leadership, and rewards). It follows that if
(a) individuals' climate perceptions are a (partial) function of situational
attributes, and if (b) sampling from a homogeneous environmental type results
in restricted ranges on situational attributes, then (c) the range should also
be restricted on individuals' perceptions, and, therefore, means of
individuals' perceptions.

The case for low within-environment variation among individuals' perceptions
is based in part on the argument above and in part on a recent report by
Schneider (in press). Range restriction in regard to the type of environment
studied suggests similarity of perceptions because of similarity in situational
stimuli. However, similarity of stimuli is not sufficient to guarantee low
within-environment variation in perceptions. My colleagues and I have argued on
a number of occasions that individuals with different cognitive construction
competencies, encoding abilities, self-regulatory systems, beliefs, needs,
values, and self-concepts will be predisposed to differ in what they perceive as
ambiguous, challenging, fair, friendly, supportive, and so forth (cf. James,
Hater, Gent, & Bruni, 1978; James & Sells, 1981). That is, psychological climates
associated with the same or similar actual environments are likely to differ
for different types of individuals, and the reasons for these differences are
not only psychologically important, but also they can be reliably measured and
related to climate perceptions (cf. James, Gent, Hater, & Coray, 1979; James,
Hater, & Jones, 1981; James & Jones, 1980).

On the other hand, if, as Schneider (in press) suggests, the environments
in question are composed of similar types of individuals, then I agree with
his conclusion that the likelihood of variation in perceptions due to individual
differences is reduced. If placement in a particular job, office, position, or
role is subject to rigorous selection standards that relate, directly or
indirectly, to cognitive information processing competencies and predispositions
(e.g., achievement motivation, cognitive complexity, intelligence, perceived
competence, and self-esteem), then the resulting relative similarity among
individuals suggests a relative similarity among perceptions of climate. Of
perhaps equal importance is the degree to which individuals with relative
similarities in attributes not necessarily related to formal selection processes
(e.g., cosmopolitan vs. local orientation, expectancies, locus of control, need
for affiliation) are attracted to (self-select) a particular job, office,
position, or role. Here again, relative similarity in individual attributes
suggests relative similarity in perceptions. Furthermore, relative similarity
among individuals resulting from formal and/or self-selection processes
generates forces toward perceptual agreement because (a) environments tend to
be shaped to fit the type of individuals who select, and are selected, to work
in them, which implies similarity of within-environmental stimuli for similar
types of individuals (cf. Endler & Magnusson, 1976; James et al., 1978); and
(b) the meaning imputed to an environment by an individual is more likely to be
socially influenced by other individuals in that environment if the perceiver
views the others as similar to himself/herself than if the others are viewed as
different (cf. Stotland & Canon, 1972).

In summary, the following two situations appear to be conducive to
underestimation of interrater reliability/agreement when estimation is based on
an ANOVA design and the ICC(1).

1) Sampling of homogeneous environments, which implies restriction of range
in the types of situational stimuli perceived in each environment and a similarity
of stimuli across environments.

2) Similar types of individuals within homogeneous environments, resulting
from rigorous formal selection processes and/or self-selection processes.
Similarity among individuals implies, in a relative sense, a narrow range of
individual differences in cognitive information processing competencies and
predispositions. This in turn suggests relative similarities in the
psychological meaning and significance imputed to environments (i.e., similar
psychological climates). It also suggests similar shaping of environmental
stimuli and social influence processes.

These two situations are conducive to the statistical conditions of low
within-environment variation resulting from similar types of individuals
perceiving similar types of stimuli, and low between-environment variation in
mean (aggregate) climate scores. An alternative to the ICC(1) approach is
indicated for estimating interrater reliability/agreement if these two
statistical conditions are operative, or perhaps even partially operative. Such
an alternative was proposed by James et al. (Note 1), and is reviewed below.

An Overview of a New Method for Estimating Interrater Reliability in Climate
Studies

The proposed procedure was based on prior work by Finn (1970) and Cooper
(1976), and employs a within-group design in which interrater reliability is
estimated separately for each group (i.e., environment). For each group,
interrater reliability is defined as the degree to which raters (perceivers)
agree with respect to their ratings (perceptions) of a particular target (e.g.,
the organization) on a particular rating (climate) scale (e.g., the equity of
an organization's pay and benefit system). A within-group design is used because
we desire an estimate of interrater reliability for each group that is not a
function of between-group variation. Thus, the estimate will not be affected
by lack of variation in group means. Furthermore, lack of homogeneity of
within-group variation is not a concern inasmuch as a separate estimate of
reliability is computed for each group. Consequently, agreement may vary as a
function of environment and we may still estimate agreement for each group.

The proposed procedure views interrater reliability (agreement) within a
group as a function of two variances, namely (a) the observed variance among the
ratings on a climate item X, designated $s_X^2$, and (b) the expected variance
among the ratings on climate item X in a condition of no agreement, designated
$\sigma_E^2$. An $s_X^2 = 0$ indicates perfect agreement; however, $s_X^2$ is
not usually equal to zero and thus we must ascertain the degree to which raters
in a group agreed. This is accomplished by comparing $s_X^2$ to $\sigma_E^2$,
where $\sigma_E^2$ is the variance on item X that would be expected if raters
responded randomly, which implies zero interrater reliability and no agreement
(cf. Finn, 1970). Thus, $\sigma_E^2$ functions as a statistical benchmark for
random responding and absence of agreement. It follows that (a) the value of
the proportion indicated by $s_X^2/\sigma_E^2$ reflects the amount of random
error variance in the observed ratings, and (b) $1 - (s_X^2/\sigma_E^2)$ is a
reliability coefficient because it indicates the proportion of nonerror
variance in the observed ratings (Finn, 1970; James et al., Note 1).

It is important to note that $\sigma_E^2$ is a statistical abstraction. Whether
raters in a particular group would ever respond in a sheerly random fashion is
irrelevant to the use of hypothetical random responding as a statistical
benchmark for assessing the extent to which the variance of a set of actual
responses, indicated by $s_X^2$, resembles the expected variance of a set of
random responses, indicated by $\sigma_E^2$. The assumption of random responding
also provides a simple method for computing $\sigma_E^2$. Random responding
implies that each alternative on the rating scale for item X has an equal
likelihood of response. This in turn implies that the distribution of scores
over alternatives is rectangular or uniform. Consequently, $\sigma_E^2$ may be
calculated using the equation for the variance of a discrete, uniform
distribution. This equation is: $\sigma_E^2 = (A^2 - 1)/12$, where A corresponds
to the number of discrete alternatives on item X. $\sigma_E^2$ is a
population parameter, and thus sample size does not enter into its calculation.
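
For readers who want the intermediate steps, the uniform-distribution variance can be verified directly. The derivation below is the standard result for a discrete variable taking the values 1, ..., A with equal probability; it is added here as a worked illustration and is not reproduced from James et al. (Note 1).

```latex
\begin{align*}
E(X)       &= \frac{1}{A}\sum_{a=1}^{A} a   = \frac{A+1}{2}, \\
E(X^2)     &= \frac{1}{A}\sum_{a=1}^{A} a^2 = \frac{(A+1)(2A+1)}{6}, \\
\sigma_E^2 &= E(X^2) - [E(X)]^2
            = \frac{(A+1)(2A+1)}{6} - \frac{(A+1)^2}{4}
            = \frac{A^2 - 1}{12}.
\end{align*}
```

For a five-point item this gives (25 - 1)/12 = 2.0, and for a seven-point item (49 - 1)/12 = 4.0.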

In summary, building on prior work by Finn (1970) and Cooper (1976), James
et al. (Note 1) derived the following equation for estimating interrater
reliability for a single group of individuals on a single item:

$$r_{WG} = 1 - (s_X^2/\sigma_E^2) \qquad (1)$$

where:

$r_{WG}$ = within-group interrater reliability for a single group
of raters on a single item X.

$s_X^2$ = the observed variance on item X in the group. Assumptions
associated with $s_X^2$ (and $\sigma_E^2$) are that raters responded
independently (this does not preclude prior social influence
processes), and that X is a discrete random variable with
multiple alternatives arranged on an approximately interval
scale (such as a Likert item; cf. Cooper, 1976).

$\sigma_E^2$ = the variance on item X that would be expected if the raters
responded randomly, which implies zero interrater reliability
and no agreement. $\sigma_E^2$ is calculated by the equation
$(A^2 - 1)/12$, where A is the number of alternatives on item X
(the scale on X is 1, ..., A).

Equation 1 is easily interpreted. If $s_X^2 = 0$, then $r_{WG} = 1.0$; that is,
no variance on X results in perfect interrater reliability (agreement).
Conversely, if raters were to respond randomly, then $s_X^2 \approx \sigma_E^2$
and $r_{WG} \approx 0$. The typical situation in research is
$0 < s_X^2 < \sigma_E^2$. Equation 1 indicates that as $s_X^2$ approaches
$\sigma_E^2$, interrater reliability decreases, and as $s_X^2$ becomes
progressively smaller than $\sigma_E^2$, interrater reliability increases.

The use of Eq. 1 is illustrated by application to the data in Table 1. Item
X has five alternatives, or A = 5, and thus $\sigma_E^2$ is equal to
$(5^2 - 1)/12$, or 2.0, in each of the two groups. The observed variance
($s_X^2$) on X in Group 1 is .211, and the estimate of $r_{WG}$ provided by
Eq. 1 is .89 [i.e., 1 - (.211/2.0)]. Using similar procedures, the estimate of
$r_{WG}$ in Group 2 is 1 - (.261/2.0), or .87. Given the similarity of these
two estimates (and the observed variances), the estimates were averaged to
furnish a value of .88. The value of .88 is obviously different from the ICC(1)
of .00, and it is equally obvious that each $r_{WG}$ and the average $r_{WG}$
are more consistent with the data than the ICC(1).
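
The arithmetic for the single-item estimate is simple enough to script. The following is a minimal sketch of Eq. 1 using the two observed variances quoted in the text; the function names are hypothetical and chosen only for this illustration.

```python
def expected_random_variance(num_alternatives):
    """Variance of a discrete uniform distribution on 1..A: (A**2 - 1) / 12."""
    return (num_alternatives ** 2 - 1) / 12

def r_wg(observed_variance, num_alternatives):
    """Eq. 1: within-group interrater reliability for a single item."""
    return 1 - observed_variance / expected_random_variance(num_alternatives)

# Observed variances reported in the text for the two environments (A = 5).
estimates = [r_wg(0.211, 5), r_wg(0.261, 5)]
average = sum(estimates) / len(estimates)

print([round(r, 2) for r in estimates], round(average, 2))
```

Because the benchmark variance depends only on the number of alternatives, each group's estimate is computed without reference to any other group, which is the property that separates this approach from ICC(1).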

It should be noted that averaging the separate $r_{WG}$ across groups is not
recommended if the $r_{WG}$ are dissimilar. A homogeneity of variance test on
observed variances (i.e., the $s_X^2$s) assists in ascertaining whether to
average $r_{WG}$s in nonobvious situations.

Interrater reliability for composite scores. Data employed in climate
studies are often based on a composite score rather than a single item. For
example, each member of a workgroup rates that group on a set of items designed
to measure workgroup cooperativeness. A composite score is then calculated for
each rater by computing a sum or a mean over the items, and it is these scores
that are entered into the within-group interrater reliability (agreement)
analysis. Based on rationale by Finn (1970), James et al. (Note 1) derived an
equation for estimating the interrater reliability among raters' composite scores
on a set of J (j = 1, ..., J) items in a single group. The derivation was based on
the assumptions that (a) the J items represent a random sample of items from a
single, well-defined domain of items (cf. Lord & Novick, 1968); (b) the raters
in each group are randomly sampled from a population of raters to which
inferences will be made (which allows the population of raters to be homogeneous);
and (c) the item variances and interitem covariances are equal, respectively, in
the rater population, which implies that the items are "essentially parallel"
indicators of the same construct.

An example of the design in question is presented in Section A of Table 2.
The data represent ratings (i.e., item responses) provided by six raters
($i = 1, \ldots, n_k$; $n_k = 6$) on four essentially parallel items that
measure the same climate dimension. Each of the four items (J = 4) employs the
same seven discrete, approximately equally spaced alternatives (A = 7).


Insert  Table  2  about  here 


The generally recommended statistical procedure for estimating interrater
reliability for multiple ratings in a within-group design should not be used
here. As shown in Section B of Table 2, the within-group ICC is approximately
.00 [equation for ICC(2,1), Shrout & Fleiss, 1979]. This is due to the fact
that the items have essentially identical means, from which it follows that the
between-item mean square is close to zero. The within-group ICC can only
assume high values when between-item variance is larger than within-item
variance. Given essentially parallel items, this is not likely to be the case,
and the within-group ICC underestimates interrater reliability.

The procedure described by James et al. (Note 1) is designed to estimate
interrater reliability among rater composite scores in the form of means,
designated $\bar{X}_i$. The $\bar{X}_i$ are displayed at the bottom of the data
matrix in Section A of Table 2. The estimating equation takes a number of
forms; the most direct for computing purposes is as follows:

$$r_{WG(\bar{X}_i)} = \frac{J[1 - (\bar{s}_{X_j}^2/\sigma_E^2)]}{J[1 - (\bar{s}_{X_j}^2/\sigma_E^2)] + (\bar{s}_{X_j}^2/\sigma_E^2)} \qquad (2)$$

where:

$r_{WG(\bar{X}_i)}$ = within-group interrater reliability for mean rater
scores (the $\bar{X}_i$) on J essentially parallel items.

$\bar{s}_{X_j}^2$ = the mean of the observed variances on the J items;
it is assumed here that each of the J items employs the
same seven alternatives.

$\sigma_E^2$ = same definition as before, namely the expected variance
of an item in a condition of zero interrater
reliability and no agreement. Technically, the mean
$\sigma_E^2$, or $\bar{\sigma}_E^2$, should be used in Eq. 2, but with
A = 7 for all items the mean is identical to the $\sigma_E^2$ of each item.


rWG(Xt) 


The  use  of 
v  „  is  .98, 


Eq.  2  is  illustrated  in  Section  C  of  Table  2.  The  estimate  of 
which  contrasts  sharply  with  the  within-group  ICC  of  .00.  It 
is also clear that an interrater reliability of .98 is a more accurate reflection
of the data than a .00. To be fair here, one could argue that the within-group
ICC is low because the items were not sampled randomly from a heterogeneous
population of items, thus violating the implicit ANOVA assumption of variation
among item means. The within-group ICC was included only to demonstrate its
inapplicability.
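
The computation in Eq. 2 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the small rating matrix are hypothetical, the expected variance is taken from a discrete rectangular distribution as (A^2 - 1)/12 (with the continuous form (A - 1)^2/12 as an option), and the estimate is set to .00 whenever the mean item variance reaches or exceeds the expected variance, in keeping with the treatment of negative estimates.

```python
from statistics import mean, variance


def expected_variance(num_alternatives, continuous=False):
    """Variance expected under purely random (rectangular) responding.

    Assumption: (A^2 - 1)/12 for a discrete scale with A alternatives,
    or (A - 1)^2/12 if an underlying continuous distribution is assumed.
    """
    a = num_alternatives
    return (a - 1) ** 2 / 12 if continuous else (a ** 2 - 1) / 12


def rwg_composite(ratings, num_alternatives, continuous=False):
    """Eq. 2 for one group; ratings has one row per rater and J item scores per row."""
    num_items = len(ratings[0])
    # Observed (sample) variance of each item across the raters in the group.
    item_variances = [variance([row[j] for row in ratings]) for j in range(num_items)]
    ratio = mean(item_variances) / expected_variance(num_alternatives, continuous)
    if ratio >= 1:
        # Mean item variance at or above the random-response variance:
        # treated here as no agreement and reported as .00.
        return 0.0
    agreement = num_items * (1 - ratio)
    return agreement / (agreement + ratio)


# Hypothetical data: four raters, two essentially parallel seven-point items.
example = [[4, 4], [4, 5], [5, 4], [4, 4]]
print(round(rwg_composite(example, num_alternatives=7), 3))
```

In this sketch the only inputs are the rater-by-item matrix and the number of response alternatives; no between-item or between-group variance enters the calculation, which is why identical item means do not depress the estimate.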

Equation 2, like Eq. 1, may be applied in each of K groups, and the resulting
$r_{WG(\bar{X}_i)}$ may be averaged over the K groups if the separate $r_{WG(\bar{X}_i)}$ are similar.
Homogeneity of variance tests on the mean item variances over the K groups might
be employed to help to decide whether to average the $r_{WG(\bar{X}_i)}$. Finally, it is
suggested that if the decision is to average, and the $n_k$ differ, there would be
little reason to weight the $r_{WG(\bar{X}_i)}$ by $n_k$ because the $r_{WG(\bar{X}_i)}$
should be similar. This applies also to averaging $r_{WG}$.


In summary, the discussions above outline the use of $r_{WG}$ and $r_{WG(\bar{X}_i)}$.
Statistical derivations and discussions of potential criticisms of the procedures
are presented in James et al. (Note 1). It is noted here that very small $n_k$
(e.g., less than 10 individuals in a group) may lead to unstable results, and
that very short (e.g., $A \le 3$) or very long (e.g., $A > 9$) item scales may
produce artificial results. Additional points developed more fully in the James
et al. paper are (a) although the theoretical distribution on an item X may be
normal (Hsu, 1979; Selvage, 1976), a rectangular (uniform) distribution should
be used to calculate $\sigma_E^2$ because the rectangular distribution, and not the normal
distribution, models random responses (the normal distribution models partial
agreement because of central tendency); (b) the calculation of $\sigma_E^2$ may be based
on an assumed underlying continuous, rather than discrete, distribution by
using $(A-1)^2/12$ to calculate $\sigma_E^2$ (Selvage, 1976); (c) like ICC(1), the
estimates of $r_{WG}$ and $r_{WG(\bar{X}_i)}$ are biased, but the bias is expected to be minimal
for small $n_k$ and essentially negligible for large $n_k$; and (d) also like ICC(1),
$r_{WG}$ and $r_{WG(\bar{X}_i)}$ can assume values of less than .00, in which case the value is
set equal to .00 because all observed distributions on an item X that result in
negative values are due to serious degrees of disagreement [the same recommendation
was made for ICC(1) by Bartko, 1976].

Empirical  Comparison  of  Between-Group  and  Within-Group  Approaches 


The  data  employed  in  this  illustration  were  collected  by  David  W.  Bracken 
as  part  of  a  dissertation  project  at  the  Georgia  Institute  of  Technology,  and 
loaned  to  the  present  investigator  to  demonstrate  statistical  procedures.  The 
data  met  the  two  situations  and  statistical  conditions  discussed  earlier  in 
which an ICC(1) procedure would be expected to provide an underestimate of
interrater  reliability.  Statistical  conditions  are  discussed  shortly.  Of 
initial  concern  is  that  Situation  1  was  satisfied  inasmuch  as  all  environments 
were  of  the  same  organization  subtype.  The  environmental  sample  consisted  of 
field  offices  of  a  large  business  machines  company.  Each  office  (a)  operated 
as  a  self-contained  subsystem;  (b)  performed  the  same  functions,  namely  marketing, 
installing, and servicing small business machines; and (c) had the same hierarchical/functional
differentiation, where the staff consisted of managerial personnel,
marketing  personnel,  supervisors,  technicians  (see  below),  and  clerical 
personnel.  All  offices  were  located  in  the  United  States,  with  the  exception  of 
one  location  in  Puerto  Rico.  The  offices  varied  in  size,  but  size  was  not 
related  to  the  data  of  interest  here. 

Situation 2 refers to relative homogeneity of within-office variance on
individual difference variables that could influence scores on climate
variables.  This  situation  was  partially  satisfied  in  the  following  manner.  The 
parent  corporation  supported  a  study  designed  to  ascertain  if  climate  moderated 
relationships  between  scores  on  selection  tests  and  performance.  The  study 
focused  exclusively  on  the  position  of  technician,  which  is  similarly  described 
for  all  offices  as  installing  and  servicing  business  machines.  For  the  present 
study,  the  relative  homogeneity  of  variance  on  individual  difference  variables 
for  technicians  was  demonstrated  by  comparing  selection  test  data  to  published 
norms  in  test  manuals,  where  the  most  heterogeneous  norm  samples — high  school 
students — were  selected  for  comparison  purposes.  The  tests  included  the  Bennett 
Test  of  Mechanical  Comprehension  and  the  Gordon  Personal  Inventory  and  Personal 
Profile.  The  personality  data  were  of  primary  interest  here  because  personality 
comprises  a  salient  basis  for  predispositions  toward  assignment  of  meaning  in 
higher-order  cognitive  processing  (cf.  James  &  Jones,  1980;  James  et  al.,  1978; 
James  et  al.,  1979;  Jones  &  Gerard,  1967;  Kim,  1980;  Mischel,  1973;  Stotland  & 
Canon,  1972). 

Test  data  were  available  on  approximately  2,800  technicians;  all  offices 
(K=87)  were  represented  in  this  sample.  These  data  were  based  on  test  results 
obtained  since  1975,  the  time  at  which  the  parent  corporation  initiated  a 
formal  reporting  program.  Company  personnel  regarded  these  data  as  representative 
of  most  technicians  because  the  tests  had  been  used  in  the  same  fashion  for  a 
number  of  years  prior  to  the  initiation  of  the  reporting  program.  While  it  was 
not possible to confirm/disconfirm this assumption empirically, the results for
interrater  reliability  on  climate  perceptions  reported  shortly  suggest  that  the 
assumption  is  valid. 

The  relative  homogeneity  of  variance  on  individual  difference  variables  is 
indicated  by  the  statistics  reported  in  Table  3.  These  results  demonstrate 
that  (a)  the  means  on  test  scores  for  the  technician  sample  were,  with  one 
exception (sociability), far above the averages of the norm samples, as
indicated  by  percentile  ranks  ranging  from  67  to  95;  and  (b)  on  the  average, 
the  variances  of  test  scores  on  the  technician  sample  were  approximately  one- 
half  as  large  as  those  on  the  norm  samples.  Furthermore,  as  shown  in  column 
three  of  Table  3,  no  meaningful  variation  in  technicians'  test  scores  was 
associated  with  differences  among  offices,  which  is  indicated  by  the  small  eta- 
squares  (eta-squares  were  based  on  one-way  ANOVAs  using  office  as  the  independent 
variable).  These  results  suggest  that  variation  on  individual  difference 
variables  was  relatively  restricted  for  the  technician  sample  and  that  technicians 
were,  on  the  average,  similar  across  offices.  This  does  not  imply  that  all 
technicians  were  the  same  or  reported  the  same  climate  perceptions;  for  example, 
sufficient  variation  remained  to  conduct  a  principal  components  analysis  on  the 
climate  scores  for  the  technician  sample. 


Insert  Table  3  about  here 


The  sample  of  technicians  employed  in  the  interrater  reliability  analyses 
on  climate  perceptions  consisted  of  7,180  individuals  from  60  offices.  (These 
data  were  collected  in  the  first  phase  of  the  study;  data  collected  in  later 
phases  were  essentially  identical  to  those  reported  here.)  The  technicians 
completed  a  102  item  climate  questionnaire,  developed  specifically  for  technicians, 
in 1980. Principal components analysis furnished 13 components that were
interpretable as cognitive representations of the work environment and had scale
reliabilities of .70 or greater (coefficient alpha). Abbreviated designations
of 11 of the climate dimensions for which "office" was a potentially appropriate
level of explanation are presented in Table 4. The remaining two climate
dimensions (supervisor support, workgroup cooperativeness) are not included
because different levels of explanation were indicated (e.g., different
supervisors and workgroups within a particular office). Also shown in Table
4 are the number of items per climate dimension, dimension internal consistency
estimates (coefficient alpha), estimates of interrater reliability furnished by
an ICC(1) approach, the variance of the mean climate scores for the
60 offices, the average within-office variance on each climate dimension, and
estimates of interrater reliability supplied by $r_{WG(\bar{X}_i)}$, which are reported in
terms of the range of estimates for the 60 offices and the percent of offices
for which the estimate was .70 or above.


Insert  Table  4  about  here 


The  intraclass  correlations  were  computed  by  first  calculating  composite 
(mean)  scores  on  each  of  the  11  climate  dimensions  for  each  technician.  These 
scores  were  based  on  a  mean  of  the  scores  on  the  items  that  loaded  on  each  climate 
component.* For each climate dimension, a random effects, one-way ANOVA was
conducted to obtain estimates of the between-office and within-office mean
squares. The ICC(1) equation was then employed to estimate interrater reliability,
using a harmonic mean of 75.05 for office size (office size ranged from 5 to
215). As shown in the ICC(1) column in Table 4, the estimates of interrater
reliability were uniformly low, ranging from .01 to .10. These results are
generally consistent with prior climate studies, although the ICC(1)s are on the
low side.
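
The steps in this paragraph can be sketched as follows. The sketch assumes the standard single-rating intraclass correlation from a one-way random effects ANOVA, (MSB - MSW) / [MSB + (n - 1) MSW], with the harmonic mean of the group sizes standing in for n when offices differ in size; the function name and the tiny data sets are hypothetical and are not the study data.

```python
from statistics import harmonic_mean, mean


def icc1(groups):
    """ICC(1) from a one-way random effects ANOVA; groups is a list of score lists."""
    sizes = [len(g) for g in groups]
    all_scores = [x for g in groups for x in g]
    grand_mean = mean(all_scores)
    group_means = [mean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (len(groups) - 1)
    ms_within = ss_within / (len(all_scores) - len(groups))
    # Harmonic mean of group sizes is used as n (equals the common size if balanced).
    n = harmonic_mean(sizes)
    return (ms_between - ms_within) / (ms_between + (n - 1) * ms_within)


# Hypothetical offices whose means differ: ICC(1) is clearly positive.
print(icc1([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))

# Hypothetical offices with identical means and little within-office spread:
# the between-office mean square is zero, so ICC(1) is negative despite
# raters within each office giving nearly the same scores.
print(icc1([[3, 3, 4, 4], [3, 4, 4, 3], [4, 3, 3, 4]]))
```

The second call illustrates why ICC(1) depends on differences among group means: with no between-office variation the estimate cannot be high, however closely the raters within an office agree.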

An explanation for these low ICC(1)s is furnished in columns 4 and 5 of
Table 4. A mean (aggregate) of the technicians' climate scores was computed for
each climate dimension for each office, thus furnishing 60 mean office climate
scores for each climate dimension. The variance of the means for each climate
dimension on the sample of 60 offices is reported in column 4. The range of
possible mean office scores is 1 to 5. The range of the 11 variances is .01
to .07, which on a five-point scale suggests restriction of range on the mean
climate scores for offices. These results support the prior contention of
homogeneity of the environmental sample. The average within-office variances
on technicians' climate composite (mean) scores are presented in column 5. The
values reflect the variation of the climate composite scores in an office about
the mean for that office, averaged over the 60 offices. Again using a scale of
1 to 5, these variances generally reflect low within-office variation on the
climate composite scores (exceptions occurred for dimensions 2 and 11).

The data shown in columns 4 and 5 of Table 4 are consistent with the two
statistical conditions (low variation among office means and low within-office
variation in technicians' climate scores) that suggest a low ICC(1). Low
within-office variation implies further that the interrater reliabilities based on
$r_{WG(\bar{X}_i)}$ should be substantially higher than those based on ICC(1). The estimates
of $r_{WG(\bar{X}_i)}$ were based on applications of Eq. 2 in each of the 60 offices for each
of the climate dimensions. The $r_{WG(\bar{X}_i)}$ for each climate dimension differed somewhat
across the 60 offices, and thus ranges and the percent of estimates greater
than or equal to .70 are reported in columns 6 and 7 of Table 4. The estimates
furnished by this analysis were substantially higher than the estimates provided
by the ICC(1) procedure. For example, at least 88% of the 60 offices had $r_{WG(\bar{X}_i)}$
of .70 or greater on 9 of the 11 climate dimensions. Moreover, even for the
remaining two dimensions (dimensions 2 and 11), not only did some offices (22%
and 65%, respectively) have $r_{WG(\bar{X}_i)}$s $\ge$ .70, but also the lowest value in the
range of $r_{WG(\bar{X}_i)}$s exceeded ICC(1). It should also be mentioned that the values
of $r_{WG(\bar{X}_i)}$ and the percent of $r_{WG(\bar{X}_i)}$s $\ge$ .70 tended to be larger for the climate
dimensions with the larger number of items (dimensions 1, 3, and 10). This is
a result of the fact that $r_{WG(\bar{X}_i)}$ is a function of $J$, the number of items in a
composite (see Eq. 2). A substantive interpretation of this function is that
the mean score per rater (i.e., $\bar{X}_i$) will contain less random measurement error
as the number of essentially parallel items in a composite increases. Thus, the
estimate of interrater reliability among composite scores will be less influenced
by random error in the original item measurements as the number of essentially
parallel items in the composite increases. On the other hand, inspection of
Eq. 2 shows that a large $J$ does not guarantee a large $r_{WG(\bar{X}_i)}$ (e.g., if
$\bar{s}_{x_j}^2 \cong \sigma_E^2$, then $r_{WG(\bar{X}_i)} \cong 0$).
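
The dependence of Eq. 2 on the number of items can be made concrete with a short sketch. It rewrites Eq. 2 in terms of a single quantity, the ratio of mean observed item variance to the expected variance; the function name and the chosen ratios are hypothetical values picked only for illustration.

```python
def rwg_from_ratio(num_items, ratio):
    """Eq. 2 expressed through ratio = mean item variance / expected variance.

    Assumes 0 <= ratio <= 1; larger ratios indicate disagreement and are
    outside the range this sketch is meant to illustrate.
    """
    agreement = num_items * (1 - ratio)
    return agreement / (agreement + ratio)


# Tabulate the estimate for 1, 5, and 10 items at three illustrative ratios.
for ratio in (0.25, 0.50, 1.00):
    row = [round(rwg_from_ratio(j, ratio), 2) for j in (1, 5, 10)]
    print(ratio, row)
```

With the ratio held at .50, the estimate rises from .50 for a single item to about .83 for five items and about .91 for ten, whereas a ratio of 1.00 yields .00 regardless of the number of items.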

In summary, the results for the $r_{WG(\bar{X}_i)}$ analysis, in contrast to the results
for the ICC(1) analysis, suggest that the technicians in most offices tended to
agree with respect to their climate perceptions on 9 out of a possible 11 climate
dimensions. Consequently, mean or aggregate climate scores for technicians could
be calculated for 9 climate dimensions in almost all offices, and used to describe
the shared psychological environment (climate) among technicians in that office.


Conclusions 

A  conclusion  that  is  not  warranted  by  the  discussion  and  illustration 
is  that  all  prior  climate  studies  that  employed  between-group  designs  reported 
underestimates  of  interrater  reliability  (perceptual  agreement).  For  example, 
given  homogeneity  of  within-group  variance  and  moderate  to  large  between-group 
differences in group mean climate scores, the ICC(1) can assume high and
reasonably accurate estimates of interrater reliability (James et al., Note 1).

The  primary  problem  occurs  when  between-group  differences  are  small  and  within- 
group  variation  is  low,  which  is  most  likely  to  occur  when  groups  are  sampled 
from the same environmental type or subtype and the range on individual difference
variables is restricted due to formal and self-selection procedures. Inasmuch
as  many  climate  studies  employ  at  least  samples  of  environments  from  the  same 
environmental  type,  the  potential  for  underestimates  of  interrater  reliability 
is  present.  Unfortunately,  published  studies  do  not  furnish  sufficient  data 
for  reanalysis  using  the  methods  described  here. 

It  is  strongly  recommended  that  climate  researchers  reexamine  their  data  in 
light  of  the  substantive  and  statistical  points  made  in  this  paper.  I  believe 
that  it  is  reasonable  to  assume  that  such  reexamination  will  lead  to  different 
conclusions  for  at  least  some  studies.  That  is,  if  the  empirical  illustration 
reported  here  is  generalizable,  then  it  is  quite  possible  that  estimates  of 
interrater  reliability  will  be  higher,  perhaps  much  higher,  than  those  reported 
previously.  On  the  other  hand,  the  empirical  illustration  may  be  idiosyncratic 
and  not  generalizable.  One  could  argue  that  the  estimates  of  interrater 
reliability  were  higher  than  would  generally  be  expected  because  only  individuals 
from  the  same  position  (i.e.,  technician)  were  included  in  the  analysis.  I  will 
attempt  to  counter  this  argument  by  suggesting  that  individuals  in  different 
positions  in  an  organization  are  likely  (a)  to  experience  different  situational 
stimuli,  which  contributes  to  different  perceptions  of  climate  (cf.  Newman,  1975), 
and  (b)  to  vary  in  regard  to  individual  variables  which  affect  the  meaning 
assigned  to  situational  stimuli.  The  latter  concern  is  viewed  as  a  function  of 
the  formal  selection  and  self-selection  processes  discussed  earlier,  and  as  a 
function  of  experience  in  the  organization  (e.g.,  increases  in  self-esteem 
resulting  from  promotions).  This  suggests  that  we  should  consider  position,  or 
perhaps  families  of  similar  positions,  as  a  key  variable  on  which  to  base 
agreement  analyses.  This,  of  course,  is  an  empirical  question  that  can  be 
addressed  in  future  research. 

In  conclusion,  it  is  submitted  that  estimates  of  interrater  reliability 
equal  to  or  above  .70  should  not  be  all  that  uncommon  in  climate  research  if 
(a) the data have satisfactory psychometric properties (e.g., high scale
reliabilities),  (b)  attention  is  given  to  the  appropriate  level  of  explanation 
of  the  climate  variable,  which  operationally  means  that  before  one  computes  an 
interrater  reliability,  he/she  should  be  reasonably  assured  that  subjects  were 
perceiving  the  same  set  of  events,  and  (c)  individuals  on  whom  estimates  of 
agreement  are  based  are  relatively  similar  in  regard  to  personalistic  variables 
that  relate  to  cognitive  information  processing  of  climate  perceptions.  It  is 
hypothesized,  therefore,  that  individuals  tend  to  agree  at  substantially  higher 
levels  than  reported  previously  in  the  climate  literature,  given  the  conditions 
specified.  If  this  hypothesis  is  confirmed,  then  the  concept  of  an  organizational 
climate,  or  perhaps  "position  climate",  is  alive  and  well,  although  we  may  wish 
to  adopt  a  different  descriptor  than  "organizational"  to  indicate  aggregate 
psychological  climate  perceptions.  Finally,  a  somewhat  obvious  but  nevertheless 
important  point  requires  mention.  If  the  environments  (positions)  in  a  sample 
are indeed homogeneous, and if the within-environment variation on a climate
variable is low, then it follows that the $r_{WG(\bar{X}_i)}$s can be quite high but the
mean climate scores will not relate highly, or perhaps even moderately, to other
environmental  variables  (e.g.,  structural  variables).  This  is,  of  course,  due 
to  the  restriction  of  range  on  the  mean  climate  scores,  and  most  likely  other 
environmental  variables.  Thus,  the  points  raised  here  regarding  the  effects 
of restriction of range on interrater reliability estimates such as ICC(1)
extend  directly  to  relations  between  aggregate  climate  scores  and  other  variables. 
On  the  other  hand,  restriction  of  range  is  not  as  serious  for  climate  data  as 
it  may  be  for  other  variables  inasmuch  as  climate  data  often  serve  an 
important  diagnostic  function,  such  as  ascertaining  whether  individuals  in  an 
environment,  or  each  of  a  set  of  environments,  perceive  their  pay  and  benefit 


programs  as  fair  and  equitable. 

* Inasmuch as the $r_{WG(\bar{X}_i)}$ procedure does not weight items explicitly, the
items were not weighted explicitly (e.g., component scores) in the calculation
of means. This provided as comparable a base as possible for contrasting
reliability estimates provided by the ICC(1) and $r_{WG(\bar{X}_i)}$ approaches.


NS = not significant at p < .05


Table 2

Illustrations of Within-Group ICC and $r_{WG(\bar{X}_i)}$ for
a Single Group of Raters

B. Within-Group ICC

within-office variance of technicians' climate composite scores.


